Missing Value Imputation in Time Series Using Top-k Case Matching
نویسندگان
چکیده
In this paper, we present a simple yet effective algorithm, called the Top-k Case Matching algorithm, for the imputation of missing values in streams of time series data that are similar to each other. The key idea of the algorithm is to look for the k situations in the historical data that are most similar to the current situation and to derive the missing value from the measured values at these k time points. To efficiently identify the top-k most similar historical situations, we adopt Fagin’s Threshold Algorithm, yielding an algorithm with sub-linear runtime complexity with high probability, and linear complexity in the worst case (excluding the initial sorting of the data, which is done only once). We provide the results of a first experimental evaluation using real-world meteorological data. Our algorithm achieves a high accuracy and is more accurate and efficient than two more complex state of the art solutions.
منابع مشابه
Missing data imputation in multivariable time series data
Multivariate time series data are found in a variety of fields such as bioinformatics, biology, genetics, astronomy, geography and finance. Many time series datasets contain missing data. Multivariate time series missing data imputation is a challenging topic and needs to be carefully considered before learning or predicting time series. Frequent researches have been done on the use of diffe...
متن کاملContinuous Imputation of Missing Values in Streams of Pattern-Determining Time Series
Top-k Case Matching (TKCM). To impute a missing value in a time series s at time tn: 1. Define query pattern P (tn), spanning the values of d reference time series over a time frame of l time points anchored at time tn 2. Look for the k most similar non-overlapping patterns in a sliding window over the time series 3. Impute the missing value s(tn) as the average of the values of s at the anchor...
متن کاملچند رویکرد برخورد با مقادیر گمشده متغیرهای کمی و بررسی اثر آنها بر نتایج حاصل از یک کارآزمایی بالینی
Background and Objectives: A major challenge that affects the longitudinal studies is the problem of missing data. Missing in the data may result in the loss of part of the information which reduces the accuracy of the estimator and obtain the results will be biased and inaccurate. Therefore, it is necessary to evaluate the missing data mechanism from a longitudinal research and to consider thi...
متن کاملCollateral missing value imputation: a new robust missing value estimation algorithm for microarray data
MOTIVATION Microarray data are used in a range of application areas in biology, although often it contains considerable numbers of missing values. These missing values can significantly affect subsequent statistical analysis and machine learning algorithms so there is a strong motivation to estimate these values as accurately as possible before using these algorithms. While many imputation algo...
متن کاملBIOINFORMATICS Collateral Missing Value Imputation: A New Robust Missing Value Estimation Algorithm For Microarray Data
Motivation: Microarray data is used in a range of application areas in biology, though often it contains considerable numbers of missing values. These missing values can significantly affect subsequent statistical analysis and machine learning algorithms so there is a strong motivation to estimate these values as accurately as possible prior to using these algorithms. While many imputation algo...
متن کامل